Release v0.6.0: MMC scoring fix and chart-based W&B diagnostics #37
Merged
Score MMC on the validation split where meta_model.parquet has ids, merge those metrics into HPO trials, and replace raw diagnostics tables with matplotlib bar/heatmap charts. Add MultiTarget model access helpers, live W&B training logs, and regression tests for MMC and W&B logging. Co-authored-by: Cursor <cursoragent@cursor.com>
Bump version, document validation MMC scoring and matplotlib diagnostics in README/CHANGELOG, and stabilize matplotlib state between tests. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop metric aliases, legacy meta-model column fallbacks, model type aliases, and W&B param fallbacks. Delete flaky or duplicate tests and consolidate W&B diagnostics coverage into multitarget tests. Co-authored-by: Cursor <cursoragent@cursor.com>
Use Optuna-backed local HPO with persisted study state, keep payout as default objective, and improve routing diagnostics with feature-group metadata in WandB logs. Co-authored-by: Cursor <cursoragent@cursor.com>
Multi-target/multi-head pipelines stripped era from X without passing era_train to nested models, breaking Packboost in multi_blend trials. Co-authored-by: Cursor <cursoragent@cursor.com>
HPO already optimizes payout_score but the console/JSON leaderboard still sorted and displayed corr_sharpe; align ranking, columns, and EDA view. Co-authored-by: Cursor <cursoragent@cursor.com>
Add meta-neutralization, bounded ensemble optimization, MMC validation loading, payout_score objective with fast holdout, and supporting tests for the MMC plan. Co-authored-by: Cursor <cursoragent@cursor.com>
…ls to 25 Reduce dead Optuna dimensions by suggesting model-specific and routing params only when active, and extend TPE startup exploration before Bayesian optimization. Co-authored-by: Cursor <cursoragent@cursor.com>
Penalize validation-only overfit in leaderboard/best_config selection, expose val vs holdout metrics in trial logs, allow --max-models 3 in fast mode, and warn when resume switches eval mode. Co-authored-by: Cursor <cursoragent@cursor.com>
Save the full post-worker flat config to TrialDB (minus runtime keys), wire meta neutralization through to_numerai_predict via numerai_meta_model, and apply single-model lane preprocessors during feature routing. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
- load_mmc_validation_frame aligns validation.parquet with meta_model.parquet so W&B metric/mmc is no longer null; HPO merges mmc, mmc_sharpe, and payout_score after train-era holdout evaluation.
- --local search now uses Optuna TPESampler (multivariate) instead of random sampling; the study is persisted in optuna.db alongside trials.db; a --sampler random flag is available to fall back to random search.
- payout_score as default objective: HPO now optimises 0.75 * corr_sharpe + 2.25 * mmc_sharpe by default instead of corr_sharpe.
- MAX_ROUTED_FEATURES = 1000 hard cap prevents OOM; active_groups, active_groups_count, and routed_feature_count are logged per trial in W&B and the summary table.
- diagnostics/ tables replaced with matplotlib bar charts, correlation heatmaps, and line charts; NaN metrics are skipped in trial logging.
- wandb_logging.py bridges loguru to the W&B Logs panel and logs per-round XGBoost metrics during training.
- pipeline/model_access.py provides iter_trained_models, model_prediction_map, and multitarget_blend_weights for SHAP and ensemble diagnostics.

Test plan
- make fmt: passes
- make types: passes (283 source files, no errors)
- make test: 283 passed, 3 skipped
- […] Optuna sampler: tpe) and feature groups appear in log (groups=..., features=...)
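For reference, the default objective from the Summary and the NaN handling in trial logging can be sketched as below. Only the weights 0.75 and 2.25 and the metric names come from this PR; the function names `payout_score` and `drop_nan_metrics` and the dictionary layout are hypothetical, and the real implementation may differ.

```python
import math

# Weights as stated in the Summary: 0.75 * corr_sharpe + 2.25 * mmc_sharpe.
CORR_WEIGHT = 0.75
MMC_WEIGHT = 2.25


def payout_score(corr_sharpe: float, mmc_sharpe: float) -> float:
    """Combine the two Sharpe-style metrics into a single HPO objective."""
    return CORR_WEIGHT * corr_sharpe + MMC_WEIGHT * mmc_sharpe


def drop_nan_metrics(metrics: dict) -> dict:
    """Return a copy of metrics without NaN entries (hypothetical helper).

    Non-float values are kept as they are; only float NaNs are dropped,
    so a missing metric is skipped rather than logged as NaN.
    """
    return {
        name: value
        for name, value in metrics.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


if __name__ == "__main__":
    trial_metrics = {
        "corr_sharpe": 1.0,
        "mmc_sharpe": 0.5,
        "mmc": float("nan"),  # e.g. no overlapping ids with the meta model
    }
    trial_metrics["payout_score"] = payout_score(
        trial_metrics["corr_sharpe"], trial_metrics["mmc_sharpe"]
    )
    print(drop_nan_metrics(trial_metrics))
```

Because mmc_sharpe carries three times the weight of corr_sharpe, a trial that improves MMC modestly can outrank one with a noticeably higher correlation Sharpe, which is consistent with the leaderboard being re-sorted by payout_score in this release.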